AUTOMATIC COMPUTER MAPPING OF TERRAIN*

by 

Harry W. Smedes 
U.S. Geological Survey 
Denver, Colorado 

ABSTRACT 

Computer processing of 17 wavelength bands of visible, reflective infrared, 
and thermal infrared scanner spectrometer data, and of three wavelength bands 
derived from color aerial film has resulted in successful automatic computer 
mapping of eight or more terrain classes in a Yellowstone National Park test
site.

The tests involved: 1) supervised and 2) non-supervised computer programs;
3) special preprocessing of the scanner data to reduce computer processing time
and cost, and improve the accuracy; and 4) studies of the effectiveness of the
proposed Earth Resources Technology Satellite (ERTS) data channels in the
automatic mapping of the same terrain, based on simulations, using the same set
of scanner data.

The following terrain classes have been mapped with greater than 80 percent
accuracy in a 12-square-mile area with 1,800 feet of relief: 1) bedrock exposures,
2) vegetated rock rubble, 3) talus, 4) glacial kame meadow, 5) glacial till
meadow, 6) forest, 7) bog, and 8) water. In addition, shadows of clouds and cliffs
are depicted, but were greatly reduced by using preprocessing techniques.


*Publication authorized by the Director, U.S. Geological Survey

Work done in cooperation with the National Aeronautics and Space Administration 


PURPOSE AND SCOPE 


For several years now there have been discussions and expressions of concern
about the need to examine vast areas of the earth's surface, the advantages
of high-altitude aircraft and satellite-borne remote sensors to gather the
needed data, and at the same time concern about the immense quantity of data
that is needed and would become available. Handling these data will require
automatic processing by computer, not to make the final and only decisions of
classification, but to perform one or more of the following three basically
different tasks:

1) to perform the first-approximation rough interpreting, calling attention 
to special places that warrant examination by a human interpreter; 

2) to enable us to extend our interpretation far beyond the range of human 
vision into the reflective infrared and thermal infrared, at the same time 
combining and integrating the data from many different parts of this ex- 
panded spectrum--something which cannot be done from the study of any 
single image; and 

3) to enable us to extract additional information by either amplifying very
small differences in radiance which are on or even below the limit of
visual recognition, or by portraying broad ranges in radiance uniformly
as a single value in order to determine or clarify relations previously
obscured by mottled radiance.

For some of these operations a human can do a better job of interpreting, 
but the computer can do it faster. In this case, computer processing is largely 
a matter of data compression. However, some others of these operations cannot 
be done directly by a human interpreting some image or images. In these cases 
the computer processing enables man to extend his capability and perform tasks 
not otherwise possible. 

It is for these reasons that the U.S. Geological Survey engaged in this 
study of automatic data processing by computer, that includes: 

A. Testing the suitability of existing sensors and computer software; 

B. Determining how many and what kinds of natural and manmade terrain classes 
can be satisfactorily classified in this particular climatic region; 

C. Simulating the spectral response of the proposed Earth Resources Tech- 
nology Satellite (ERTS) sensors. 

This report summarizes the current status of studies of computer processing 
of airborne multispectral data and color photographs, the success of automatic 
recognition and mapping of the distribution of eight or more different terrain 
classes, and the effectiveness of the proposed ERTS data channels in the auto- 
matic recognition and mapping of the same terrain classes based on simulations, 
using the same set of scanner data. 

This study involves the data from one flight over a test area of about 12
square miles in a region of moderate relief (1,800 feet) comprising a wide 
variety of terrain types (Figures 1 and 2). 

The data were acquired and processed in analog and digital form by the 
Institute of Science and Technology of the University of Michigan; and were 
processed in digital form by the Laboratory for Applications in Remote Sensing 
(LARS) at Purdue University; the Center for Research (CRES) at the University 
of Kansas; and EG&G, Inc. of Bedford, Massachusetts. The project also involved 
limited studies of spatial pattern recognition using optical lasers, and image 
enhancement by electronic and optical methods, but these studies are not 
described in this report. 


COLLECTION OF THE DATA 


A multispectral survey was made of selected test areas in Yellowstone
National Park during flights by the University of Michigan in September 1967,
on a NASA-sponsored contract to the U.S. Geological Survey.

The University of Michigan 12-channel scanner-spectrometer in the 0.4 to
1.0 μm range (Table I) provided the principal data for the computer processing
described in this report. In addition, two scanner systems recorded a total of
five channels of reflective and thermal infrared data in the region from 1.0 to
14 μm.

Photographs taken at the same time the scanner data were acquired provide 
important supplements to the control data. These photographs consist of color, 
color infrared, black and white panchromatic, and black and white infrared film 
on board the aircraft, and color film from stations on the ground. Special 
computer processing was performed on some of the color aerial film for comparison 
with scanner data. 

A simplified diagram of the Michigan scanner-spectrometer is shown in
Figure 3. As the aircraft flies over the test area, the ground surface is
scanned in overlapping strips by successive sweeps as a mirror is rotated at
about 3,600 rpm. The radiant energy from the earth's surface is reflected off
the rotating mirror and focused by other mirrors (M, Figure 3) onto the slit of
a prism spectrometer, thus refracting the rays into a wavelength spectrum.

Fiber optics placed at appropriate places lead to photomultiplier tubes
which measure the amount of radiant energy received in each of 12 overlapping
bands or channels of this spectrum from 0.4 to 1.0 μm (visible violet to
reflective infrared). This energy, now converted to a voltage, is fed to a
multitrack tape recorder where each of the 12 channels is recorded as a separate
synchronized signal on magnetic tape. Similar, separate scanners recorded the
infrared part of the spectrum from 1 to 14 μm (see Table I).
Table I. Wavelength bands of University of Michigan multispectral system.

   Channel    Wavelength        Channel    Wavelength
   number     band (μm)         number     band (μm)

                     SCANNER NO. 1

      1       0.40-0.44            7       0.55-0.58
      2        .44- .46            8        .58- .62
      3        .46- .48            9        .62- .66
      4        .48- .50           10        .66- .72
      5        .50- .52           11        .72- .80
      6        .52- .55           12        .80-1.00

                     SCANNER NO. 2

      1       1.0 -1.4             3       3.0 -4.1
      2       2.0 -2.6             4      *4.5 -5.5

                     SCANNER NO. 3

                                          *8.0-14.0

*Denotes thermal infrared channels; others are reflective.

TERRAIN CLASSES MAPPED


The eight terrain classes discussed on the following pages were selected 
arbitrarily during field study and the early part of computer processing. They 
were selected not on the basis of composition or genesis, as we traditionally 
do in the course of geologic mapping, but on the basis of their overall surface 
color and radiance (brightness) inasmuch as that is what the sensor was 
recording. 

For example, geologists are more interested in the areal distribution of a 
sand and gravel unit, such as glacial till, than in the distribution of forest. 
Conventional maps would show the extent of till regardless of whether it was the 
site of a meadow or was covered with dense forest. The terrain classes of this 
study necessarily show the unforested till as one class (till) and the forested 
till as a different class (forest). In fact, all forested terrain, regardless 
of underlying rock or soil unit, is shown as a single class (forest). 

Initial processing disclosed that at least 13 classes could be separated. 
Several of these were subunits which have been combined to make the display 
shown in Figure 25. The following is a brief description of the nine classes 
(including shadows) mapped. 
1. BEDROCK EXPOSURES

This class (Figure 4) consists of bare bedrock exposed by glacial and 
stream erosion and mantled by minor amounts of loose rubble. These are 
unvegetated except for lichens and sparse tufts of dry grass, and have 
high reflectance in nearly all channels. 

2. TALUS 

This class includes blockfields, talus, and talus flows of basalt lava
flows, volcanic tuff, and gneiss, formed by frost-riving and solifluction
from outcrops. These are blocky and well-drained deposits; trees are
widely spaced or absent (Figure 5). Blocks generally are covered with
dark-gray lichens (Figure 6). The blocks range from a few centimeters to
about 1 meter in diameter; most are larger than 10 centimeters. The slopes
range widely, from 35°-45° at the head, to 5° or less at the toe. In places,
a basin or trough lies just inside the distal margin of talus flows.

3. VEGETATED ROCK RUBBLE

This class consists of locally derived angular rubble, frost-riven from
basalt lavas, volcanic tuff and breccia, and gneiss. Grasses, lichens,
evergreen seedlings, and mosses now cover more than three-fourths of the
surface underlain by this debris (Figures 7 and 8). Blocks range in
diameter from less than 1 centimeter to about 1 meter, and occur on slopes
of from 0° to about 25°.

4. GLACIAL KAME MEADOW 

These are meadows underlain by sand and gravel, and mantled by sandy silt
(Figures 9 and 10). The deposits are well-drained and are vegetated by
grass and sagebrush. About one-fourth of the area of this class is exposed
mineral soil. Deer and elk manure locally covers as much as one-fourth of
the surface area.

5. GLACIAL TILL MEADOW 

This class consists of meadow areas underlain by glacial till. These are 
grassland and sagebrush areas (largely dormant at time of flight) with 
mineral soil exposed over about one-fifth of the area (Figures 11, 12, and 
13). Mineral soil consists of mixtures of silty to bouldery debris. Deer 
and elk manure locally is abundant in these meadows. 

6. FOREST

Depicted here are Douglas Fir and lodgepole forest (see Figure 5). Local
clusters of deciduous trees were recognized separately, but combined with
evergreens in the displays.

7. BOG 

These are moist areas supporting tall lush growth of sedges and grasses.
Bogs are rather abundant because of glacial scour and derangement of
drainages.


8. WATER 

The Yellowstone River and Floating Island Lake are present in the test 
area. Phantom Lake was dry at the time of flight, and was considered 
therefore as bog rather than water. 

9. SHADOWS 

Cloud shadows are near west and south-central margins of the test area, 
and deep shade occurs at base of north-facing cliffs and along north edge 
of forest areas. 

DATA PROCESSING BY ANALOG COMPUTER 

The analog processing described in this section of the report was conducted
at the University of Michigan.

Any given channel of magnetic tape data can be reproduced by photographing
a cathode-ray tube video (C-scope) presentation of the tape data (Figure 14).
By changing the gain and amplitude, and thresholding out certain upper and (or)
lower limits, different levels of radiance can be enhanced. Quantizing and
contouring of thermal infrared data are examples of this technique.

For example, the continuous curve of image density versus log of exposure
(Figure 15) can be broken electronically into discrete steps of variable width,
all densities within each step being displayed in analog as a single density.
Examples of density slicing of this sort are shown for thermal infrared data in
Figure 16. The continuous-tone image is on the top, the n-level density-sliced
equivalent in the middle. Electronic triggering at changes of density steps can
produce a "spike" which can be displayed as a thermal contour, shown on the
bottom of Figure 16. Ground temperature measurements or other control data can
be used to convert this into a quantitative thermal contour map (the Michigan
thermal scanner has internal calibrations, so that no ground control is required).
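
The density slicing and contouring just described were done electronically in
analog form. As a rough digital analogy only, the same two steps can be sketched
with NumPy. The threshold values, the sample patch, and the function names are
hypothetical and are not taken from the Michigan equipment.

```python
import numpy as np

def density_slice(image, edges):
    """Assign each continuous value the index of the density step it falls in."""
    return np.digitize(image, edges)

def contour_mask(levels):
    """Flag cells whose step index differs from the right or lower neighbor."""
    mask = np.zeros(levels.shape, dtype=bool)
    mask[:, :-1] |= levels[:, :-1] != levels[:, 1:]
    mask[:-1, :] |= levels[:-1, :] != levels[1:, :]
    return mask

# Hypothetical 3 x 4 patch of radiance values (arbitrary units).
patch = np.array([[0.1, 0.2, 0.6, 0.9],
                  [0.1, 0.3, 0.7, 0.9],
                  [0.2, 0.3, 0.7, 0.8]])
edges = [0.25, 0.5, 0.75]      # three thresholds give four density slices
levels = density_slice(patch, edges)
contours = contour_mask(levels)
```

Here `levels` plays the role of the n-level density-sliced image, and `contours`
marks the cells where a step change would trigger a "spike".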

If each density slice is reproduced separately, copied on a colored
transparent film and then the films stacked together, the result is a color-coded
quantized thermal map (Figure 17).

Each of the 12 reflective and 5 thermal channels can be printed as video
images comparable to that of Figure 14. These prints would constitute 17-channel
multiband imagery. These images contain differences in tone (density), and hence
information, that are on or even below the limit of visual recognition, but that
can be amplified or enhanced and made visible by electronic means. However, now
that the data are recorded as signals on magnetic tape, they can easily be
processed electronically in several ways to enhance selected features and to
determine the statistical parameters of the spectral radiance (reflectance or
emittance) of each class of material in the scene. This was done in analog form
for data from the 12-channel scanner, that is, from the visible violet to the
near infrared, or about 0.4 to 1.0 μm.
Specific targets in the scene can be selected. By electronically measuring
the mean and standard deviation of the signals in each of the 12 channels,
spectral signatures and covariance functions can be obtained for each target class,
and the optimum channels for separating each object class from all other classes
can be determined in the computer by spectral-matching or by maximum-likelihood
statistical decision rules.

For example (Figure 18), radiance (a voltage now) is measured for each of
the channels for water (W) and forest (F). A vertical line two standard
deviations long, centered about the mean radiance, is shown for each of the 12
channels.

Note that these two classes overlap only in the reflective IR part of the 
spectrum (right side). Other materials overlap these two and each other in var- 
ious places; some have dissimilar and others have closely similar spectral re- 
flectance. 
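
The comparison of mean radiance with bars two standard deviations long can be
sketched numerically. This is only an illustration: the three-channel voltages
for the two classes are invented, and the function names are hypothetical.

```python
import numpy as np

def signature(samples):
    """Per-channel mean and standard deviation; samples is (n_points, n_channels)."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=0), samples.std(axis=0, ddof=1)

def overlapping_channels(sig_a, sig_b):
    """Indices of channels where the mean +/- 1 sigma bars of two classes overlap."""
    (mean_a, std_a), (mean_b, std_b) = sig_a, sig_b
    low = np.maximum(mean_a - std_a, mean_b - std_b)
    high = np.minimum(mean_a + std_a, mean_b + std_b)
    return np.flatnonzero(low <= high)

# Invented voltages: three data points per class, three channels each.
water = [[1.0, 1.2, 0.5], [1.2, 1.0, 0.7], [1.1, 1.1, 0.6]]
forest = [[2.0, 2.4, 0.6], [2.2, 2.2, 0.8], [2.1, 2.3, 0.7]]
overlap = overlapping_channels(signature(water), signature(forest))
```

With these invented numbers only the last channel overlaps, so the first two
would be the useful ones for separating the two classes.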

The radiance in channel 1 can be compared to that in channel 2 for each
class of material. The distribution of this spectral covariance data might have
the form shown diagrammatically in Figure 19, where each cluster represents a
different material. R1 and R2 represent radiance in channels 1 and 2,
respectively. A-D represent classes of material.

On a frequency diagram the covariance radiance data may appear like that
shown in Figure 20, where the seemingly topographic surface is the surface that
bounds the distribution of data points. R1 and R2 are radiance in channels 1
and 2, as before; f is the frequency of distribution of data.

This surface is topologically similar to, and can be considered as, a prob- 
ability diagram. Each class (A-D) has its own peak value and falls off in all 
directions, generally in Gaussian fashion. 

Statistically, if radiance values in channels 1 and 2 are r1 and r2, they
fall under the peak of class A and should be classified as belonging to class A.
They have high probability of belonging to that class. But values that fall in
the regions of low relief are highly questionable in terms of which class they
belong to. They have low probability values.

Part of the computer processing involves maximum-likelihood theory to
establish what plane or contour level should be applied as a threshold limit,
for instance the contour along which the "topographic" surface intersects the
vertical cylinder (shaded) of Figure 20. Any value falling outside the area
enclosed by that contour would be rejected for that class by the computer. For
simplicity, the threshold limit (cylinder) is shown only for class A in
Figure 20. The "contour" at which the thresholds are set for other classes may
be different for each class, depending on the distribution of data (shape of
the probability surface) and overlap of different classes.
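
For a Gaussian class, a contour of constant probability density is a contour of
constant Mahalanobis distance from the class mean, so the threshold test for one
class can be sketched as below. This assumes the Gaussian model; the class mean,
covariance, and limit are invented values, and the function names are
hypothetical.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared statistical distance of point x from a Gaussian class."""
    d = np.asarray(x, dtype=float) - mean
    return float(d @ np.linalg.solve(cov, d))

def accept(x, mean, cov, max_dist_sq):
    """True if x lies on or inside the threshold contour of the class."""
    return mahalanobis_sq(x, mean, cov) <= max_dist_sq

# Invented two-channel statistics for class A.
mean_a = np.array([4.0, 2.0])
cov_a = np.array([[1.0, 0.0],
                  [0.0, 0.25]])
limit = 9.0                                       # a "3 sigma" contour, squared

near = accept([4.5, 2.2], mean_a, cov_a, limit)   # under the peak: kept
far = accept([8.0, 2.0], mean_a, cov_a, limit)    # region of low relief: rejected
```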
The peak positions and the entire "topography" of this surface would be
different in a plot of channel 1 vs 3, 1 vs 4 . . . 7 vs 9, 7 vs 10 . . . etc.
The computer compares the radiance in channel 1 with that in 2, 1 vs 3, 1 vs 4
. . . 7 vs 9, 7 vs 10 . . . etc., until all 144 combinations of the 12 channels
have been computed. From these data, the complex 12 x 12 covariance matrix
function is computed and stored for making decisions of classification. These
data represent the multispectral characteristics or "signatures" of each
designated terrain class. A diagram cannot be drawn to illustrate this
12-dimensional space; it exists only in a mathematical sense. That is why the
computer is needed.
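
The 144 channel-by-channel combinations correspond to the entries of a 12 x 12
covariance matrix. A minimal sketch follows, using random numbers in place of
real training data; the sample size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in training sample: 200 data points by 12 channels (random, not real data).
samples = rng.normal(size=(200, 12))

mean_vector = samples.mean(axis=0)           # 12 channel means
cov_matrix = np.cov(samples, rowvar=False)   # 12 x 12 covariance matrix

n_entries = cov_matrix.size                  # all channel-by-channel combinations
n_unique = len(np.triu_indices(12)[0])       # distinct entries of a symmetric matrix
```

Because the matrix is symmetric, only 78 of its 144 entries are distinct: the 12
channel variances plus 66 channel-pair covariances.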

After the spectral distribution and the covariance functions are
determined for all classes sought, the entire tape of the traverse can be run
through and the computer instructed to recognize and show only areas whose
spectral reflectance matches that of one class, for example FOREST. A photograph
of the cathode-ray tube shows the distribution of all areas recognized as FOREST
by this spectral matching technique (Figure 21).

Separate runs for recognition of other terrain classes can be printed in
different colors and overlaid, resulting in a sandwich which is a colored map
presentation of the data.

Although there are the advantages of real-time mapping of terrain units by 
having the analog computer in the aircraft during acquisition of data, for most 
scientific applications it has proven more feasible and more accurate to use 
digital programs to determine optimum channels and threshold levels and to feed 
this data back to the analog computer for the actual display and mapping in 
analog form. 


DATA PROCESSING BY DIGITAL COMPUTER 

Three basically different digital computer processing studies or tests 
were conducted. These involved: 

1. Supervised programs using the scanner data 

a. without preprocessing 

b. with preprocessing of two different types 

2. Non-supervised programs using the scanner data and clustering techniques 

3. Non-supervised programs using the aerial color film and clustering 
techniques 


SUPERVISED PROGRAMS 

Although the digital computer programs used at Purdue and at Michigan
differ in detail, they are closely similar. The Purdue program was used without
preprocessing of data; the Michigan programs were used to test different
preprocessing techniques and to closely simulate the spectral response of the
ERTS data channels.
TESTS WITHOUT PREPROCESSING

Instead of being processed in analog form, as in Figure 14, the data can
be processed in digital form by making a digitized copy of the original
magnetic tape. This is the procedure used by the Laboratory for Applications
in Remote Sensing (LARS), Purdue University. This section of the report
discusses the Purdue method of handling multispectral scanner data, and
preliminary results obtained on a section of one flight-line of the Yellowstone
Park data.

This particular run was digitized in such a manner that, on the average,
there was neither overlap nor underlap of adjacent scan lines (Figure 3). The
scanner resolution is three milliradians, and the aircraft altitude was about
6,000 feet above terrain. This required that every 10th scan line be digitized.
Also, each scan line contains 220 ground resolution cells. The scanner mirror
rotates at constant angular rate whereas the digitizing was done at constant
linear rate. This, plus the effect of topographic relief, changes the size and
shape of the ground resolution cell from the midpoint to both ends of the scan
line. Even so, the average dimensions of the ground resolution cell are
approximately 20 by 20 feet. There is a gap of about 20 feet between cells along
each scan line, making each cell effectively 20 by 40 feet.
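
The quoted cell size can be checked with small-angle arithmetic. The sketch
assumes a flat surface directly beneath the aircraft, which the text notes is
only approximately true; the function name is hypothetical.

```python
def ground_cell_size(altitude_ft, resolution_mrad):
    """Footprint directly beneath the aircraft: altitude times angle in radians."""
    return altitude_ft * resolution_mrad / 1000.0

cell_ft = ground_cell_size(6000, 3)   # 18.0 ft
```

The result, 18 feet, is consistent with the approximate 20-foot cell quoted
above once scan angle and relief are allowed for.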

The analog data were quantized to 8-bit accuracy. Therefore, each resolu- 
tion element of each spectral band has one of 256 possible values. 
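
Quantizing to 8 bits maps each analog voltage onto one of 256 integer levels.
A minimal sketch, assuming a hypothetical 0 to 5 volt signal range (the actual
range of the recorded signal is not given here):

```python
import numpy as np

def quantize_8bit(voltage, v_min, v_max):
    """Scale voltages to 0..255 and round; out-of-range values are clipped."""
    scaled = (np.asarray(voltage, dtype=float) - v_min) / (v_max - v_min)
    return np.clip(np.round(scaled * 255), 0, 255).astype(np.uint8)

codes = quantize_8bit([0.0, 1.0, 5.0, 6.0], v_min=0.0, v_max=5.0)
```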

A computer printout of the data from any given channel is made to simulate
the analog video display by breaking the continuous tones of the gray scale into
a finite number of discrete gray levels, assigning a letter or symbol to each
level in accordance with the relative amount of ink each symbol imprints onto
the paper. An example is given in Figure 22. Each of the 15 reflective and 2
thermal channels could be printed as video and (or) digital printout images,
constituting 17-channel multiband imagery (for example, see Lowe, 1968, figs.
12a and 12b, p. 94 and 95). The area coordinates are fed to a computer system*,
which then computes the statistical parameters of each class of material. These
statistics are calculated from the relative response in each channel (Figures
23 and 24). Relative response can be considered as an uncalibrated reflectance
measurement, where the lack of calibration between channels allows only relative
comparisons of the various classes of materials within each channel. The
statistical parameters calculated are based on an assumed Gaussian distribution
of the data, and include the mean, standard deviation, covariance, and divergence
(i.e., the statistical measure of the separability of classes). These statistics
are stored by the computer, and are used to represent the multispectral
characteristics of each designated class of material. These statistics constitute
the multispectral pattern or "finger-print" of each terrain class, and are used
in the computer program to 1) determine which channels are most useful for
recognition of all object classes studied, and 2) actually classify the unknown
data points into the known classes using a Gaussian maximum-likelihood decision
scheme.

*An IBM 360 model 44 computer with 64 k bytes (8 bits per byte) of core storage
was used. The principal computer language used was FORTRAN, with ASSEMBLY used
for some of the support programs.
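
The gray-scale printout, in which each gray level is given a symbol according to
how much ink it puts on the paper, can be sketched as follows. The symbol ramp
and the choice to print low radiance as the darkest symbol are assumptions made
for illustration; they are not the symbols used at LARS.

```python
import numpy as np

# Hypothetical symbol ramp, ordered from least to most ink on the page.
SYMBOLS = " .-=+*#M"

def printout(channel, symbols=SYMBOLS):
    """Render 8-bit data as text lines; low radiance gets the darkest symbol."""
    data = np.asarray(channel, dtype=np.int64)
    n = len(symbols)
    idx = (255 - data) * n // 256     # 0 for the brightest cells, n-1 for the darkest
    return ["".join(symbols[i] for i in row) for row in idx]

rows = printout([[0, 64, 128, 255],
                 [255, 192, 96, 32]])
for line in rows:
    print(line)
```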

Four channels were used in the Purdue study. This decision was based on
experience at LARS-Purdue which has shown that the use of only 4 of the 12
channels in the 0.4 to 1.0 μm range results in approximately as good a
classification as does the use of more channels. Computer time, which increases
in a geometric fashion with the number of channels used in the classification, is
costly; therefore, some optimum for the number of channels used, the quality of
results, and funds expended must be achieved. The channels selected are shown
in Table II.

The channel-selection part of the computer program provides the capability 
of measuring the degree of separability of Gaussianly-distributed classes and 
determining the optimum set of channels for doing so. This is done by calculating 
the statistical distance in N-dimensional space between the classes, N being 12 
in this case. 
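
Channel selection by statistical distance can be sketched as below. The
divergence formula is the usual one for two Gaussian classes, and ranking
subsets by average pairwise divergence is an assumption; the criterion actually
used in the Purdue program is not stated here. The class statistics are invented
and the function names are hypothetical.

```python
import numpy as np
from itertools import combinations
from math import comb

def divergence(mean_i, cov_i, mean_j, cov_j):
    """Divergence between two Gaussian classes (larger means more separable)."""
    inv_i, inv_j = np.linalg.inv(cov_i), np.linalg.inv(cov_j)
    d = (mean_i - mean_j)[:, None]
    term_cov = 0.5 * np.trace((cov_i - cov_j) @ (inv_j - inv_i))
    term_mean = 0.5 * np.trace((inv_i + inv_j) @ d @ d.T)
    return term_cov + term_mean

def best_channels(stats, n_select):
    """Try every channel subset; keep the one with the highest average divergence."""
    n_channels = len(next(iter(stats.values()))[0])
    best, best_score = None, -np.inf
    for subset in combinations(range(n_channels), n_select):
        idx = np.array(subset)
        sub = np.ix_(idx, idx)
        scores = [divergence(mi[idx], ci[sub], mj[idx], cj[sub])
                  for (mi, ci), (mj, cj) in combinations(stats.values(), 2)]
        score = float(np.mean(scores))
        if score > best_score:
            best, best_score = subset, score
    return best, best_score

# Invented three-channel statistics: the classes differ in channels 0 and 2 only.
stats = {"water": (np.array([1.0, 5.0, 2.0]), np.eye(3)),
         "forest": (np.array([4.0, 5.0, 6.0]), np.eye(3))}
chosen, score = best_channels(stats, n_select=2)

n_subsets = comb(12, 4)   # number of 4-channel subsets of 12 channels
```

An exhaustive search over 4 of 12 channels has to score 495 subsets for every
pair of classes, which illustrates why computer time was a concern.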

The classification part of the computer program involves the actual
classification (mapping) of an arbitrary number of classes using an arbitrary
number of channels and a Gaussian maximum-likelihood scheme. The display part of
the program displays the results in line-printer form, and analyzes the
recognition performance in each training area.

A thresholding capability is provided in the display process. If the
resolution element does not exceed a predetermined threshold (that is, if the
element does not look sufficiently like a member of the class to which it has
tentatively been assigned, even though that is the most likely class), then
final classification of that element is declined and that element is assigned to
a null class (rejected) and displayed as a blank. Different thresholds may be
assigned to each of the classes individually.
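
A Gaussian maximum-likelihood decision with per-class thresholds and a null
class can be sketched as follows. Equal prior probabilities are assumed, and
the two-channel statistics, threshold values, and class symbols are invented;
this is not the Purdue program itself.

```python
import numpy as np

def log_likelihood(x, mean, cov):
    """Log of the Gaussian density of x for one class."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(x) * np.log(2 * np.pi))

def classify(x, stats, thresholds, null_label=" "):
    """Pick the most likely class; print a blank if it falls below its threshold."""
    x = np.asarray(x, dtype=float)
    scores = {name: log_likelihood(x, m, c) for name, (m, c) in stats.items()}
    winner = max(scores, key=scores.get)
    return winner if scores[winner] >= thresholds[winner] else null_label

# Invented two-channel statistics for forest (F) and water (W).
stats = {"F": (np.array([2.0, 8.0]), np.eye(2)),
         "W": (np.array([1.0, 1.0]), np.eye(2))}
thresholds = {"F": -6.0, "W": -6.0}   # may differ from class to class

labels = [classify(p, stats, thresholds)
          for p in ([2.2, 7.5], [1.1, 0.9], [5.0, 5.0])]
```

The third point is closer to forest than to water but resembles neither well
enough, so it is rejected and shown as a blank.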

A segment of the digital computer terrain map is shown in Figure 25. The
part shown is composed of about 57,860 data points, about 22 percent of the full
map. The full map covers an area of about 2 by 6 miles and is composed of
269,060 data points.

Initial processing disclosed that at least 13 classes could be separated. 
Several of these were subunits which have been combined to make the displays 
shown in Figure 25. 

Hand-colored maps give a more graphical portrayal of the distribution of 
classes, but could not be reproduced here. 
Table II. Channels used in the terrain classification and mapping, and to
simulate the ERTS data channels using Purdue University's computer programs.

                          Wavelength band   Color or           Michigan scanner
                          used (μm)         spectral region    channel number

  Best 4 channels         0.44-0.46         Blue                      2
                           .62- .66         Orange                    9
                           .66- .72         Red                      10
                           .80-1.0          Infrared                 12

  Thermal overlay          .66- .72         Red                      10
                           .80-1.0          Infrared                 12
                          2.0 -2.6          Infrared                 --
                          8.0 -14           Thermal infrared         --

  ERTS scanner channels:
    0.5-0.6 μm             .52- .55         Green                     6
     .6- .7 μm             .62- .66         Orange                    9
     .7- .8 μm             .72- .8          Infrared                 11
     .8-1.2 μm             .8 -1.0          Infrared                 12

  ERTS RBV cameras:
    0.535 μm peak          .52- .55         Green                     6
     .680 μm peak          .66- .72         Red                      10
     .760 μm peak          .72- .8          Infrared                 11

Thermal overlay 

Another aspect of the work underway is a terrain classification made by
substituting one or more data channels from the infrared scanners (1.0-14 μm)
for those of the 12-channel scanner (0.4-1.0 μm).

For this test, channels 1, 3, 5, 7, 9, 10, 11, and 12 of the 12-channel
scanner were combined with the 1.0-1.4 μm, 2.0-2.6 μm, 4.5-5.5 μm, and 8-14 μm
channels. A computer program recently developed at LARS-Purdue made it possible
to overlay the data from these two separate scanner systems. The computer
selected the best set of four of these channels (Table II) for classification of
the terrain in the same manner as before. The maximum mismatch of registry is
no more than three ground resolution cells, and in most places probably is no
more than one cell.

The "map" of Figure 26 is the result of overlaying one thermal and three
reflective channels (0.66-0.72, 0.80-1.0, 2.0-2.6, and 8-14 μm). Only one of
these channels is in the visible range. Because the scan angle of the thermal
scanner was much narrower than that of the reflective scanner, this display
covers only the middle east-west strips of those shown in Figure 25. The close
correspondence of this display with the others indicates the accuracy of
classification.
These studies should enable us to further extend the range of potential
diagnostic spectra for existing classes and may point out some additional terrain
classes. In addition, they will be useful tests of how well computer programs
can take data from different scanner systems and automatically overlay them to
produce a single set of multispectral data.

Simulation of ERTS data channels 


Along with the studies of evaluating the accuracy of performance, we are 
studying how well data in wavelength bands tentatively designated for the proposed 
Earth Resources Technology Satellite (ERTS) might serve for automatic mapping of 
the same eight terrain classes in the same area. 

The existing computer programs of Purdue did not allow a simulation of the
wider wavelength band width of the ERTS sensors. Instead, the midpoints of the
channels of the proposed ERTS 4-channel scanner, and the peak transmissions of
the three Return Beam Vidicon (RBV) cameras were matched with the closest channels
of the University of Michigan 12-channel scanner. These data are summarized in
Table II.
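
The matching of ERTS band midpoints and RBV peak wavelengths to the closest
Michigan channels can be sketched as a nearest-midpoint lookup. The band limits
are those of Table I; measuring "closest" by distance to the channel midpoint is
an assumption, and the function name is hypothetical.

```python
# Michigan 12-channel scanner band limits in micrometers (Table I).
MICHIGAN_BANDS = {
    1: (0.40, 0.44), 2: (0.44, 0.46), 3: (0.46, 0.48), 4: (0.48, 0.50),
    5: (0.50, 0.52), 6: (0.52, 0.55), 7: (0.55, 0.58), 8: (0.58, 0.62),
    9: (0.62, 0.66), 10: (0.66, 0.72), 11: (0.72, 0.80), 12: (0.80, 1.00),
}

def closest_channel(wavelength_um):
    """Channel whose band midpoint lies nearest the given wavelength."""
    def distance(channel):
        low, high = MICHIGAN_BANDS[channel]
        return abs((low + high) / 2 - wavelength_um)
    return min(MICHIGAN_BANDS, key=distance)

# RBV peak transmissions, then midpoints of three of the ERTS scanner bands.
rbv_match = [closest_channel(w) for w in (0.535, 0.680, 0.760)]
scanner_match = [closest_channel(w) for w in (0.65, 0.75, 1.00)]
```

These reproduce the channel numbers listed in Table II. The midpoint of the
0.5-0.6 μm band, 0.55 μm, falls on the boundary between channels 6 and 7 and is
left out of the sketch; Table II lists channel 6 for that band.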

The classification using the simulated ERTS 3-RBV cameras is shown in Figure
27. Note the close agreement with Figure 25, which is based on the
computer-selected best set of four channels. A segment of the display of the
simulated ERTS 4-channel scanner data classification is shown in Figure 28, for
comparison with the RBV camera simulation and the computer-selected 4-channel
display (Figures 25 and 27).

Accuracy 

In general, the products are highly satisfactory terrain maps which portray
PHYSIOGRAPHIC UNITS or units which are unique associations of ROCK-SOIL-VEGETATION.

The following generalizations about accuracy of classification of the terrain
classes are based on comparison of the computer-generated maps with the ground
control data.

1. BEDROCK EXPOSURES 

This class is present mainly in the western part of the test area, along
the banks of the Yellowstone River, and in a quarry where it was moderately
well classified. Where misidentified, it generally was classified as
vegetated rock rubble, a closely similar unit into which it grades.

2. TALUS 

All of the known areas of this class and a few previously undetected areas
are clearly delineated.

3. VEGETATED ROCK RUBBLE 

The general areas classified are realistic, but in detail this class is the 
least well classified. Because of the small size of the individual areas 
occupied by this unit, it is not possible to locate precisely a homogeneous 
training area. 
4. GLACIAL KAME MEADOW

Areas of kame meadows are accurately depicted. Areas of forested kame sand
and gravel between open meadows of kame were erratically classified by the
computer, mostly as other units. Control data show that in some places this
class occurs as small scattered patches surrounded by till; in those places
it was misidentified by the computer.

5. GLACIAL TILL MEADOW 

This class was first classified as four separate subunits on the basis of 
change in illumination across the flight path, but the four were later 
combined into one unit for the map printout. Classification is estimated 
as about 95 percent accurate over the entire flight strip. The other 
classification symbols scattered throughout areas of this class generally 
are correct, for there are small areas of vegetated rubble and of bogs in 
meadow areas underlain by till. 

Although both the till and kame deposits are the sites of meadows, the 
differences in amount of soil exposed and the subtle differences in soil 
composition and texture apparently permit these two classes to be accurate- 
ly distinguished by the computer. 

6. FOREST

This class generally is well recognized in large, almost uniformly-symboled
blocks. All forest areas seem to be consistently recognized. Local
clusters of deciduous trees were recognized separately, but combined with
evergreens in the displays.

7. BOG 

This is one of the best recognized classes. All known bogs and many pre- 
viously unknown small bogs were correctly mapped. 

8. WATER 

The Yellowstone River and Floating Island Lake (see Figure 5) were clearly 
recognized. Phantom Lake (not on this segment of map) was dry at the time 
of flight, and so was correctly classified as bog rather than water. Parts 
of the Yellowstone River were omitted or generalized, principally because 
the width of the river is near the threshold of resolution, and because 
some data points were integrated values of river plus some other class or 
classes. Stretches of white-water rapids were rejected. In places, the 
shaded north edges of patches of forest were printed as scattered points 
of water or talus. 

9. SHADOWS

All shadows were recognized well. 

10. OTHER 

All data points whose reflectance did not closely fit the statistical data 
for any of the above nine classes were rejected, and shown as blank regions 
on the map. A few of these are very light and bright areas of shallow
water where bottom deposits show through, or are white-water rapids and
gravel bars.

A blacktop road can be detected in places as a line of anomalous symbols,
but is not consistently recognized as any particular class. The road is about 
as wide as a single data point and hence is at the threshold of resolution. 

Although all bedrock types were classified as a single unit, the spectral 
reflectance histograms, spectrograms, and the divergence data indicate good 
possibility of distinguishing among several of the rock types present. Further 
testing over areas of larger rock exposure seems justified. 

Where terrain classes covered large areas, they were correctly identified 
by the computer. Most inaccuracies occurred where the units were small and where 
some were below the threshold of resolution; accordingly, the radiance for a 
given resolution cell was a complex combination of several classes. Presumably, 
the computer usually selected the dominant terrain class or, by thresholding, 
indicated that the spectral properties did not clearly fit any of the classes. 

For comparison of the performance of classification using the ERTS simula- 
tions with the best sets of four channels, the computer rated itself in the 
training areas only. For example, of the total of 5,418 data points used in
training the computer, less than 20 of those were subsequently classified (using 
the best set of four channels) as something other than what they were called 
during the training. The ratings are as follows: 


    Best set of four channels      99.6 percent
    Thermal overlay                98.8
    ERTS 4-channel scanner         97.7
    ERTS 3 RBV                     93.8


The figures are a good measure of the relative accuracy of each test. They 
are misleading in part because the computer assumes that each training area is 
homogeneous, consisting 100 percent of what it was labeled. The 0.4 percent 
error probably is a close measure of the degree of inhomogeneity of the material 
in the training areas. 
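
As a quick arithmetic check of the training-area figures, the short Python sketch below converts the counts quoted in the text (5,418 training points, fewer than 20 reassigned) into a percentage. The function name is hypothetical, and 20 is used only as the stated upper bound, not as an exact count.

```python
def percent_correct(total_points, misclassified):
    """Percentage of training points assigned back to their own class."""
    return 100.0 * (total_points - misclassified) / total_points


# Counts quoted in the text; 20 is an upper bound on the reassigned points.
lower_bound = percent_correct(5418, 20)
print(round(lower_bound, 1))
```

The result is consistent with the error of about 0.4 percent cited for the best set of four channels.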

When coordinates of other known areas (test areas) are fed to the computer, 
the computer determines the classification of those areas and computes the accu- 
racy of classification. Appraisal of numerous test areas gives a more complete 
and meaningful evaluation of the overall recognition performance of the computer 
program. 
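
The test-area appraisal described above amounts to comparing known labels with computer-assigned labels, point by point. The following is a minimal Python sketch of that bookkeeping, not the program used in the study; the function name, the optional `merge` argument (for counting two gradational classes as one unit), and all sample labels are assumptions made for illustration.

```python
from collections import Counter


def evaluate(true_labels, predicted_labels, merge=None):
    """Return (overall_percent, per_class_percent) for labeled test points.

    merge: optional dict mapping a class name to the combined name under
    which it should be counted (for example, two gradational classes).
    """
    merge = merge or {}
    total = Counter()
    correct = Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        truth = merge.get(truth, truth)
        pred = merge.get(pred, pred)
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    per_class = {c: 100.0 * correct[c] / total[c] for c in total}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return overall, per_class


# Hypothetical test points, for illustration only (not data from the study).
truth = ["talus", "talus", "talus", "talus", "rubble", "rubble", "forest", "forest"]
pred = ["talus", "rubble", "rubble", "talus", "rubble", "rubble", "forest", "water"]

overall, per_class = evaluate(truth, pred)
combined = {"talus": "rock debris", "rubble": "rock debris"}
merged_overall, merged_per_class = evaluate(truth, pred, merge=combined)
```

In this made-up sample, merging the two similar classes removes the talus-versus-rubble confusions from the error count, which is the kind of effect discussed for talus and vegetated rock rubble.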

Preliminary results of computer studies which rate the accuracy of
classification of test areas give the following overall performance (data from
an unpublished report by Marc G. Tanguay):


    Best set of four channels      86 percent
    Simulated ERTS four channels   83
    Simulated ERTS 3-RBV cameras   82
    Thermal overlay                81

These figures should be taken as approximations only. They agree with a
preliminary visual estimate that the overall accuracy of all displays is more 
than 80 percent, and indicate that the best set of four channels gives slightly 
better results than the other three displays, all of which are about equally 
good. 


The drop in accuracy from 99 to 86 percent, etc., from the training to the 
test areas, is understandable, because we would expect the computer to perform 
well in the areas where it was trained unless the reflectances of two or more 
classes were closely similar in all channels used. 

For the training areas, the classification made using the overlay of thermal
and reflective channels was virtually as accurate as the best classification,
that using the computer-selected best set of four reflective channels (98.8 vs
99.6 percent, respectively). However, for the test areas, the thermal overlay
was least accurate (about 81 vs 86 percent). The slight mismatch of registry in
parts of the thermal overlay test undoubtedly results in a less accurate
classification than if all channels were in complete registry, as would occur if a
single scanner system could cover the range of 0.4 to 14 µm or more.

Nevertheless, these studies indicate that the infrared region is promising 
in the classification of some terrain units. For example, in the test areas the 
thermal overlay classification was better than the computer-selected best four- 
channel classification for glacial till (95 vs 93 percent), glacial kame (82 vs 
74 percent), and bog (81 vs 80 percent). The accuracy of classification of talus
in the test areas was only about 49 percent; however, most of the error was due
to talus being misclassified as vegetated rock rubble, a unit which actually is
quite similar to talus. If talus and rock rubble are combined as a single unit,
the accuracy jumps to about 83 percent, whereas the same combination was
classified with only about 76 percent accuracy when using the best set of four
channels in the test areas.


In geologic applications it is more desirable to know what kind of material 
the forest is growing on than simply to know where the forest is. The thermal 
overlay classification has some potential in this regard; it has been shown 
(Waldrop, 1969) that thermal infrared in forested areas can in places indicate 
the sites of thick, unconsolidated, well-drained gravels vs bare or thinly man- 
tled bedrock. 

An obvious advantage of infrared data channels for space applications is 
the haze penetration ability. Further investigations are needed to adequately 
assess the potential of these channels, particularly over areas of extensive 
rock outcrops.

Studies underway also include careful evaluation of the overall accuracy by 
point-to-point comparison with ground-truth maps. It is important to recall the
recognition of previously undetected areas of occurrence of some terrain units.
This means that errors in the control maps are being detected at the same time 
errors in the computer printout are being sought. 

In general, the ERTS simulations differed from the computer-selected best
four channels as follows:

1. For areas correctly shown as FOREST on the classification using the best 
four channels, the ERTS 4-channel classification showed small to moderate 
amounts of TALUS and WATER, whereas the RBV 3-channel classification showed 
greater amounts.

2. In places, both ERTS classifications showed considerably more BOGS than are 
present in areas that were correctly classified by the best four channels. 

3. Slightly poorer classification of water was performed in the ERTS
classification. However, few of the bodies or areas of water in the test area are
of sufficient size to serve as good training areas, so I do not view this 
part of the classification as a good test of the ability of the ERTS data 
channels to permit automatic identification of water. 

I wish to point out that these are not complete simulations of the ERTS 
data channels, but are only first approximations, because no attempt was made to 
simulate 1) the poorer resolution of the satellite sensors due to vast differences
in scale, 2) the effects of atmospheric attenuation, or 3) the broader wavelength 
bands of most of the ERTS sensors (see Table II). Studies at the University of 
Michigan aimed at more closely simulating the actual wavelength bands of the ERTS 
sensors, are described in a later section of this report. 

All four of the experiments produced good results. They are good
classifications. I do not wish to set any specific limits on how good "good" is.
Obviously, some are better than others, and none is perfect (but neither is the
man-made control map), and preprocessing of the data, as described below, further
improves the accuracy of the computer-generated maps. I am convinced, however,
that all can be considered as more than adequate for the reconnaissance first- 
approximation kind of interpreting and mapping which we expect to accomplish with 
the satellite data. 

TESTS OF PREPROCESSING OF DATA 

Programs for terrain classification using multispectral data, such as des- 
cribed above, are based on the assumption that the spectral radiance of all ob- 
jects of a given class in the scene is substantially the same. During these 
studies we soon determined that this was not true because of various factors such 
as cloud shadows, haze, variations in topography, and changes in scanner "look" 
angle. As a result, the performance of classification schemes based on spectral 
signatures was degraded, both in analog and digital tests. This was partly over- 
come by selecting multiple training sets for each class, each set effective over 
a narrow range of data along the scan line . However, this lengthened the pre- 
paration and computer-processing time. This is described more fully in a paper 
by Smedes, Spencer, and Thomson (1971).

Preprocessing of the data to compensate for the angular variations was
studied using digital computer programs developed by the University of Michigan. 
This allowed the use of a single training set for each class of objects anywhere 
in the field of view. It shortened the preparation and computer-processing time,
gave more accurate results, and, in places, enabled areas under cloud shadow to 
be classified as to the correct terrain class. 

In a way shadows and topography pose similar problems, in that they affect 
the total level of irradiance. North-facing slopes, and shadows, result in low- 
ering the total level, whereas south-facing slopes reflect more light back to the 
sensor and result in a raising of the total level. However, in both cases, the 
ratio of observed (recorded) spectral radiance in two spectral channels will be 
independent of variations in the level of radiance. 

The topographic and other scan-angle-dependent variations may be deduced
and corrected for by dividing by a function of the scan angle. 

Results of preprocessing of these two types (ratio transformation and
scan-angle function transformation) are described below.

Ratio Transformations 


Three ratio preprocessing techniques, previously reported by Kriegler and
others (1969), make use of the fact that if the scene radiance varies, this
variation is present in all spectral channels. Ratios of channel signals therefore
will show less variation with scene changes than will the signals themselves.


These ratio transformations, which have been previously used as techniques
for preprocessing the scanner-spectrometer data to reduce illumination variations,
are:

1. Ratio of each channel to the sum of all channels, e.g.,
   channel 1 / Σ channels.

2. Ratio of adjacent channels, e.g., channel 2 / channel 1.

3. Ratio of the difference to the sum of adjacent channels, e.g.,
   (channel 2 - channel 1) / (channel 2 + channel 1).


If the original scanner-spectrometer data consist of N channels, then
after performing any of these three ratio transformations there would be (N - 1)
independent channels of data. Therefore, one channel must be eliminated.
Arbitrarily, the twelfth channel was eliminated in these studies by Michigan. The
data of Table III show that the first ratio transformation (ratio of each to the
sum) gives lower probability of misclassification than do the other two ratio
transformations.
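
The three ratio transformations can be illustrated with a short Python sketch. It assumes that a change in illumination acts as a single multiplicative factor common to all channels, which is a simplification of the statement that the variation is present in all spectral channels. The function names and sample radiances are hypothetical and are not taken from the Michigan programs.

```python
import math


def ratio_to_sum(channels):
    """Transformation 1: each channel divided by the sum of all channels."""
    total = sum(channels)
    return [c / total for c in channels]


def adjacent_ratio(channels):
    """Transformation 2: channel k+1 divided by channel k."""
    return [channels[k + 1] / channels[k] for k in range(len(channels) - 1)]


def normalized_difference(channels):
    """Transformation 3: (channel k+1 - channel k) / (channel k+1 + channel k)."""
    return [
        (channels[k + 1] - channels[k]) / (channels[k + 1] + channels[k])
        for k in range(len(channels) - 1)
    ]


def all_close(a, b):
    """True when two equal-length lists agree to within rounding error."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=1e-9) for x, y in zip(a, b)
    )


# Hypothetical radiances in four channels (illustrative values only).
sunlit = [0.20, 0.35, 0.50, 0.80]
# Same material in shadow, assumed to receive 40 percent of the illumination.
shadowed = [0.4 * c for c in sunlit]

for transform in (ratio_to_sum, adjacent_ratio, normalized_difference):
    print(transform.__name__, all_close(transform(sunlit), transform(shadowed)))
```

Under that assumption each transformation gives the same values for the sunlit and shadowed samples. The ratios to the sum always add up to 1, so only N - 1 of them are independent, which is why one channel is dropped.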

Table III. Probabilities of misclassification of training area classes using
different preprocessing of the original data and ERTS scanner simulations. The
higher the value, the greater the probability of misclassification.

                                      Probability of misclassification
Preprocessing technique               Best channel   2 Best channels   3 Best channels

Ratio transformation 1
(channel n / Σ channels)
  Original data                       0.088          0.037             0.026
                                      (n=1)          (n=1,5)           (n=1,5,10)
  ERTS RBV simulation                 0.101          0.039
  ERTS scanner simulation                                              0.035

Ratio transformation 2                0.179          0.087
(ratio of adjacent channels)          [illegible]    [illegible]

Ratio transformation 3                0.186          0.091
((channel 12 - 11) /                  ((12 - 11) /   ((12 - 11) / (12 + 11),
 (channel 12 + 11))                    (12 + 11))     (10 - 9) / (10 + 9))

Normalized scan-angle                 0.070          0.015             0.009
function transformation               (5)            (5,12)            (5,12,2)


Scan-angle Transformation 

More knowledge of the scene is required for the use of this preprocessing 
technique than for the ratio preprocessing. It is assumed that the variations in 
scene radiance in each spectral channel can be described by a function of scan 
angle, which may be determined by analyzing spectral signatures from one or more 
classes of materials distributed along the scan line. Enough information must be 
available to locate samples of a given class of material at various scan angles. 

If the ratio of spectral radiances of two classes as a function of scan 
angle is constant, then we can derive a single correcting function which we can 
assume will be valid for all data in each spectral channel for all classes. This 
assumption is equivalent to assuming that the bidirectional reflectance variations 
as a function of scan angle of different classes are the same. 

A universal scan-angle correcting function for each channel is obtained by
normalizing the function of signal vs scan angle for any one material at some
particular scan angle.

For one spectral channel (channel 1, Table I), the variation in the means 
and standard deviations of the training areas of FOREST, BOG, and TILL as the scan 
angle is varied are shown in Figure 29. In this example, a radiance value of 0.8 
could be interpreted as any one of the three classes depending on the scan angle.

Figure 30 shows the same data after the scan-angle transformation. Now the 
classes are clearly separated, and a radiance value of 0.8 could only be inter- 
preted as TILL. 
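
The scan-angle correction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Michigan program: the correcting function is taken from one reference material, normalized to 1.0 at a chosen reference angle, and every observed signal is divided by its value at the corresponding scan angle. The angles, signals, and function names below are hypothetical.

```python
def correcting_function(reference_signal_by_angle, reference_angle):
    """Normalize a reference material's signal-vs-angle curve to 1.0 at reference_angle."""
    base = reference_signal_by_angle[reference_angle]
    return {angle: s / base for angle, s in reference_signal_by_angle.items()}


def correct(signal, angle, correction):
    """Divide an observed signal by the correcting function at its scan angle."""
    return signal / correction[angle]


# Hypothetical mean signals of one reference material at three scan angles (degrees).
forest_by_angle = {-30: 0.40, 0: 0.50, 30: 0.80}
correction = correcting_function(forest_by_angle, reference_angle=0)

# Hypothetical second material whose signal varies with angle in the same proportion,
# which is the assumption under which a single correcting function is valid.
till_by_angle = {-30: 0.64, 0: 0.80, 30: 1.28}
corrected_till = {a: correct(s, a, correction) for a, s in till_by_angle.items()}
corrected_forest = {a: correct(s, a, correction) for a, s in forest_by_angle.items()}
```

After correction each material has one signal level at all three angles in this made-up example, so a single training set per class could serve the whole scan line.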

If the preprocessing technique is successful, the wide separation in mean 
values of spectral signatures of classes at different scan angles (Figure 29) has 
been eliminated for training sets of the same class of material. One could easi- 
ly obtain combined signatures for each class using unpreprocessed data, as was 
done in the Purdue study. Because this results in signatures with large standard 
deviations, the signatures of different materials are far more likely to overlap, 
causing increased probabilities of misclassification. Because of this, the 
Purdue tests using non-preprocessed data required that most materials be treated 
as subunits, in terms of scan-angle position — the subunits later combined as com- 
plete units (classes) during the printout stage. 

As with the non-preprocessed data studied by Purdue, the Michigan computer 
program* utilizes a supervised training program which involves a maximum-likeli- 
hood decision scheme for selecting the best spectrometer channels, and closely 
similar techniques of digitizing the data of the analog magnetic tapes. 
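
For readers unfamiliar with maximum-likelihood decision schemes, the following Python sketch shows the general idea with a rejection threshold. It assumes Gaussian class signatures with statistically independent channels, a simplification chosen here for brevity; the Purdue and Michigan programs are not reproduced, and all names and numbers are hypothetical.

```python
import math


def log_likelihood(x, means, variances):
    """Log of a Gaussian density with independent channels (diagonal covariance)."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(x, signatures, reject_below=None):
    """Return the most likely class, or None if its log-likelihood is below the threshold."""
    best_class, best_score = None, -math.inf
    for name, (means, variances) in signatures.items():
        score = log_likelihood(x, means, variances)
        if score > best_score:
            best_class, best_score = name, score
    if reject_below is not None and best_score < reject_below:
        return None  # shown as a blank (unclassified) point on the map
    return best_class


# Hypothetical two-channel signatures: (means, variances) per class.
signatures = {
    "FOREST": ([0.20, 0.60], [0.01, 0.01]),
    "WATER": ([0.10, 0.05], [0.01, 0.01]),
}

near_forest = classify([0.22, 0.55], signatures, reject_below=-10.0)
far_from_all = classify([0.90, 0.90], signatures, reject_below=-10.0)
```

The threshold plays the role of the rejection step described for the OTHER category: a point that fits no signature closely is left unclassified rather than forced into the nearest class.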

Studies of the training areas of all classes enabled the computer to cal- 
culate the probability of misclassification by using the different preprocessing 
techniques described above. The results, summarized in Table III, indicate that 
use of the normalized scan-angle function transformation would result in the 
lowest probability of misclassification. Hence, that transformation was used in 
making the recognition map of Figure 31. The symbols used to designate the
different terrain classes are shown in Table IV.

Simulation of ERTS Data Channels 


The existing Michigan computer programs allowed a closer simulation of the 
full band width and spectral response of the ERTS sensors. 


* A CDC-1604 computer with 32K bytes (8 bits per byte) of core storage was used.
The computer language used was FORTRAN IV. An IBM-1401 computer was used for
peripheral work and for the actual printing of the recognition maps in color.




Table IV. Symbols and colors used to represent the different terrain classes
shown on the maps of Figures 28, 31, and 32.

                                        Color
Material                  Blue    Red    Green    Black    White

Bog                                        *
Forest                                     $
Glacial Till Meadow                *
Glacial Kame Meadow                *
Bedrock                    *
Vegetated Rock Rubble      *
Water                                               $
Talus                                               *
Shadow                                              •
Not classified                                             (No symbol)



The spectral responses* of the ERTS 4-channel scanner and the three RBV 
cameras were used to construct nominal spectral sensitivity curves. The detailed 
spectral response of each scanner-spectrometer channel (Larsen and Hasell, 1968) 
were fitted graphically to the specified ERTS data using a technique described by 
Nalepka (1970). Further corrections to account for peak sensitivity variations 
in photomultipliers were determined from radiance standard lamp data. The result 
was a set of weighting coefficients to be assigned to each spectrometer channel 
to simulate the ERTS sensor data. These are summarized in Table V. 

These data were used with the signatures previously obtained from the train- 
ing areas, using preprocessing with the ratio transformation in which the spectral 
channel is divided by the sum of spectral channels, using channels 1, 5, and 10
(see Tables I and IV). The terrain map made using this data is shown in Figure 
32. 


* Data obtained by F. Thomson and M. Spencer of the University of Michigan
from L. Goldberg and O. Weinstein of NASA-Goddard.

Table V. Weighting factors for simulation of ERTS sensors with original scanner
spectrometer data.

Michigan spectrometer    ERTS 4-channel scanner    Weighting of
channel number           and RBV camera            Michigan channels

         5               channel 1                 0.68
         6               channel 1                 0.86
         7               channel 1                 0.96
         8               channel 1                 0.70

         8               channel 2                 0.74
         9               channel 2                 0.96
        10               channel 2                 0.72

        11               channel 3                 1.00

        12               channel 4                 1.00

         4               camera 1                  0.72
         5               camera 1                  0.82
         6               camera 1                  0.95
         7               camera 1                  0.86

         9               camera 2                  0.94
        10               camera 2                  0.59

        11               camera 3                  1.00
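
The way such weighting factors might be applied can be sketched in Python as follows. This is only an illustration: it assumes a plain weighted sum of spectrometer channel signals, whereas the actual Michigan procedure may differ, and the grouping of spectrometer channels under each ERTS scanner channel is read from Table V. The function name and the sample signals are hypothetical.

```python
# Weights read from Table V: ERTS scanner channel -> {Michigan channel: weight}.
ERTS_SCANNER_WEIGHTS = {
    1: {5: 0.68, 6: 0.86, 7: 0.96, 8: 0.70},
    2: {8: 0.74, 9: 0.96, 10: 0.72},
    3: {11: 1.00},
    4: {12: 1.00},
}


def simulate_erts(spectrometer, weights=ERTS_SCANNER_WEIGHTS):
    """Weighted sum of spectrometer channel signals for each simulated ERTS channel."""
    return {
        erts: sum(w * spectrometer[ch] for ch, w in members.items())
        for erts, members in weights.items()
    }


# Hypothetical signals for spectrometer channels 5 through 12 (all set to 1.0).
signals = {ch: 1.0 for ch in range(5, 13)}
simulated = simulate_erts(signals)
```

With unit signals, each simulated channel simply equals the sum of its weights, which makes the relative contribution of each spectrometer channel easy to inspect.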


The probability of misclassification of training area classes using simulated
ERTS scanner data is 0.035 (Table III) and the average percentage of correct
recognition is 81 (Table VI). Although no recognition map was run for the simulated
ERTS 3 RBV cameras, the data from the training areas indicate probability of
misclassification as 0.039 using the same ratio transformation. If the simulations
had been made with normalized scan-angle transformation, the probability of
misclassification surely would have decreased, and the accuracy of recognition
would probably have increased to about 90 percent (deduced from data of Table VI).

Accuracy 

The accuracy of classification using preprocessing techniques of the original 
spectrometer data and the simulated ERTS scanner data is indicated in part by the 
data of Table VI, which is based only on the training areas. 

The ERTS simulations were made using the best of the three previously tested 
ratio processing techniques before Margaret Spencer had developed and tested her 
program for normalized scan-angle preprocessing. Had the scan-angle preprocessing 
been used, it would have permitted use of all four ERTS channels rather than the
three which resulted from the (N - 1) loss inherent in all ratio preprocessing
techniques, as described above. 


Table VI. Accuracy of recognition in training areas using different preprocessing
of original spectrometer data and ERTS scanner simulations (maps of Figures 28,
31, and 32).

                          Results of training area evaluation of the
                          recognition map, in percent correct recognition

                          Normalized        Ratio               ERTS scanner
Category                  scan-angle        transformation      simulation using
                          transformation    (channel n /        (channel n' /
                                             Σ channels)         Σ channels)*

Bedrock                    96.7              80.0                47.7
Talus                      91.9              81.0                63.1
Vegetated Rock Rubble      94.4              91.9                95.3
Glacial Kame Meadow       100                93.3                94.7
Glacial Till Meadow        98.6              74.4                89.0
Forest                     97.0              88.8                85.5
Bog                        95.0              92.7                88.9
Water                      94.7              94.7                93.4
Shadows                    97.4              91.8                71.3

Average percentage of
correct recognition        96.2              87.7                81.0

Average percentage of
incorrect recognition       3.4              11.6                18.4

Average percentage of
no recognition              0.4               0.7                 0.6


* n' indicates the simulated ERTS channel which is a non-linearly weighted
summation of several original spectrometer channels (see Table V) and does not
correspond to n which represents the non-weighted original spectrometer channel.


The simulated ERTS data gave poor results for three classes — bedrock, talus, 
and shadows. However, bedrock and talus basically are exposures of rock blocks, 
the blocks in talus simply having moved down slope a short distance, and the two 
units actually are gradational. If these two gradational classes are combined, 
the recognition performance would rise to values comparable to those of other 
classes, for nearly 18 percent of the bedrock was misidentified as talus and 
nearly 28 percent of the talus was misidentified as bedrock. The apparently poor 
performance of the ERTS data for shadows (71.3 percent) is misleading because
10.4 percent of the training area was identified as forest, which is the true
terrain unit that was under the shadow. This suggests that, somehow, the simu- 
lated ERTS data can "see" through the shadow better than can the other recogni- 
tion techniques. 

As with the non-preprocessed data, the most meaningful test of accuracy is 
not in the areas for which the computer was trained, but throughout the entire 
map, through use of numerous test areas or a point by point comparison with the 
ground control data. 

At the time of this writing, studies are underway to determine more closely 
the accuracy of these preprocessed classification maps. Preliminary results in- 
dicate that the most important improvement in accuracy has resulted from the fact 
that, because of preprocessing, the area of shadow was markedly reduced and the 
true terrain unit beneath the shadow accurately mapped. Because the terrain class 
generally under cloud shadow was forest, the classification of forest was greatly 
improved over that of the non-preprocessed maps. 

In comparing the accuracy of these maps made using preprocessing techniques 
with those made without preprocessing, the following generalizations are valid: 
bog, glacial kame meadow, water, and vegetated rock rubble were as accurate or 
slightly more accurate; forest, glacial till meadow, bedrock, and talus were 
definitely more accurate. 

Thus, in spite of far fewer training areas (24 vs 187) and fewer channels
used (3 vs 4), these maps are more accurate and required less computer time than 
those made without preprocessing. 

NON-SUPERVISED PROGRAMS 

The various techniques described above required that we train the computer 
on known areas. This requires some prior knowledge of the region. The techniques 
described below are based on non-supervised processing that requires no prior know- 
ledge of the area. These techniques utilize the fact that the radiance of differ- 
ent classes tends to cluster in different places in n-dimensional space. The pro- 
grams allow the computer to determine these clusters and to plot each class based 
on clustering, whatever the class may be. 

One such natural class might be printed in map form by the letter S, for 
example; and, although you would not know what that class really was, you would 
know everywhere it occurred. Limited field checking or photo interpretation 
would then give an identity to each of the classes mapped by this clustering 
technique. 

In addition to the fact that no prior knowledge is required, there is the 
further advantage that no calibration is required. 

Non-supervised digital computer processing of data using cluster techniques 
has been done using the multispectral scanner data and color aerial film. 

SCANNER DATA

In order to reduce computer time, the original twelve channels of data were 
preprocessed by a principal-components analysis to yield four different classes, 
but with almost all (more than 99 percent) of the statistical and spatial
structure preserved. The cell size was reduced by taking every third row of
resolution cells and every third resolution cell on each such row. This is the
equivalent of a cell 60 by 60 feet. Each of the four classes consists of a 
binary vector of 25 components. 
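
The cell-size reduction described above is a regular subsampling of the data grid. A minimal Python sketch is given below; the function name and the small sample grid are hypothetical, and the grid holds (row, column) identifiers only so that the retained cells are easy to see.

```python
def subsample(grid, step=3):
    """Keep every `step`-th row and every `step`-th cell within each kept row."""
    return [row[::step] for row in grid[::step]]


# Hypothetical 6 x 9 grid of cell identifiers (row, column).
grid = [[(r, c) for c in range(9)] for r in range(6)]
coarse = subsample(grid)

kept = sum(len(row) for row in coarse)
original = sum(len(row) for row in grid)
```

With a step of three in both directions, roughly one cell in nine is retained, which is the source of the saving in computer time.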

From this principal-components analysis, clustering and inverse clustering
functions (described briefly below) were generated for a sample of 1,908 data 
elements out of 30,044. 

Any clustering procedure assigns data elements to clusters such that the 
elements within a cluster are closely similar or related, and elements of any two 
clusters are dissimilar or unassociated. 

The clustering technique used at the University of Kansas is an iterative 
clustering procedure (Haralick and Dinstein, 1970) programmed for the GE 635 
computer.

Viewed as a geometric approach, the data elements are represented as vectors 
of N binary components, each of which has a value of +1 or -1. Hence, each data
element can be considered as a vertex of an N-dimensional unit hypercube in an
N-dimensional Euclidean space. The clustering function f projects those points
into vertices of a K-dimensional unit hypercube. The points that are projected 
into a vertex of the K-dimensional hypercube form a cluster whose code is the 
coordinates of the vertex in the K-dimensional space. The inverse clustering 
function g projects these vertices of the K-dimensional hypercube back into some 
vertices of the N-dimensional hypercube. 

The computer program enables the K-dimensional hypercube to be iteratively 
"rotated" until the sum of the distances between the original and respective 
projected-back data points is minimum. 

The f and g functions are chosen from a class of parametric functions de- 
rived from linear threshold functions. "Rotation" of the K-dimensional hyper- 
cube is accomplished by iterative small changes in these parameters. 
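
The quantity being minimized can be sketched in Python as follows. This is an illustration of the general idea only, not the Kansas program: f and g are written as simple linear threshold functions, the distance between an element and its projected-back image is assumed to be the Hamming distance (the number of differing components), and the iterative adjustment of parameters is omitted. All names, parameters, and data are hypothetical.

```python
def sign(v):
    """Threshold to +1 or -1 (zero maps to +1)."""
    return 1 if v >= 0 else -1


def threshold_map(weights, biases, x):
    """Linear threshold function: one +1/-1 output per weight row."""
    return tuple(
        sign(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    )


def hamming(a, b):
    """Number of components in which two binary vectors differ."""
    return sum(1 for p, q in zip(a, b) if p != q)


def reconstruction_cost(data, f_params, g_params):
    """Sum of distances between each element x and g(f(x))."""
    total = 0
    for x in data:
        code = threshold_map(*f_params, x)      # cluster code, K components
        back = threshold_map(*g_params, code)   # projected back, N components
        total += hamming(x, back)
    return total


# Hypothetical example with N = 4 binary components and K = 1.
data = [(1, 1, 1, 1), (1, 1, 1, -1), (-1, -1, -1, -1), (-1, -1, 1, -1)]
f_params = ([[1, 1, 1, 1]], [0])                    # code = sign of the component sum
g_params = ([[1], [1], [1], [1]], [0, 0, 0, 0])     # copy the code to every component

cost = reconstruction_cost(data, f_params, g_params)
codes = [threshold_map(*f_params, x) for x in data]
```

Elements that share a code form one cluster. A full procedure would repeatedly make small changes to the weights and biases and keep those that lower the cost.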

The cluster map derived from this iterative clustering technique by digital 
computer contained symbols delimiting four classes. These classes were outlined, 
traced onto a blank sheet and coded with drafting patterns. This final product 
is shown in Figure 33. The classes correspond generally with the following: 

    Black       = Rock

    Dark gray   = Till on south-facing slopes (low elevation) and
                  local patches of mixed vegetated rock rubble, forest,
                  and rock on south-facing slopes (low elevations)

    Medium gray = Till, kame, locally vegetated rock rubble

    White       = Forest, water, and shadow

The topographic and scan-angle functions have obviously degraded the class- 
ification as indicated by the different patterns for materials along the north 
edge (south-facing slopes at low elevation). Preprocessing by ratio or scan- 
angle function transformations undoubtedly would improve the accuracy. 

COLOR AERIAL FILM 


Traditionally, multispectral data from film have been from multilens cameras 
using black and white film, each lens filtered to pass only a limited wavelength 
band. In the present study, conducted with EG&G, Inc., the three emulsion layers 
of color infrared film were used as a three-band spectrometer. This film was
acquired at the same time as was the scanner data. Because of the focal length of
the cameras, the film covers only the central part of the full scan width. 

This study is underway at the time of this writing, so only preliminary 
results can be presented in this report. 

The three emulsions of color infrared film are most sensitive to the broad 
spectral bands (approximate) listed below, as normally exposed with appropriate
filters:


Emulsion    Color recorded    Band width        Equivalent spectrometer
                                                channel numbers

blue        green             0.50-0.58 µm      5, 6, 7
green       red               0.58-0.68         8, 9 (and half of 10)
red         infrared          0.68-0.90         10, 11 (and half of 12)


Image density data from each of the three color emulsions were entered into a 
digital computer program that produces terrain maps by using clustering techniques.

The 70-mm color transparencies were scanned by a Mann Trichromatic Micro- 
densitometer at a resolution of U50 , corresponding to a spot about 30 feet in 
diameter on the ground. All three color film layers were sampled simultaneously 
by means of beam-splitters and filters. Triads of density values were obtained 
from each area element on the transparency, as shown diagrammatically in Figure 
34, and were recorded on magnetic tape. The data were digitized to nine-bit 
accuracy (though less accuracy would suffice). Characteristic spectral signa- 
tures inherent in the data — presumably representing terrain classes — were iden- 
tified by the application of clustering techniques in three-color space. Each 
datum point was assigned on the basis of its spectral signature, to one of these 
classes, and each class was then assigned a letter character (for example, A and 
B, Figure 34). Computer-generated overlays were made to fit photographic
enlargements of the transparency at a scale of approximately 1:4,000 (Figure 35).
Existing control data enabled these classes to be labeled as to true terrain class.
This technique is described more fully in a paper by Smedes, Linnerud, and Hawks
(1971). 
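
Clustering of density triads in three-color space can be illustrated with a generic k-means procedure. The sketch below is a stand-in, not the EG&G program, whose clustering method is not described here; the density values, starting centers, and function names are hypothetical.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans(points, centers, iterations=10):
    """Plain k-means: alternate between assigning points and updating centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, [nearest(p, centers) for p in points]


# Hypothetical density triads (one value per film layer) for six area elements.
triads = [
    (0.20, 0.30, 0.90), (0.25, 0.30, 0.85), (0.20, 0.35, 0.90),
    (0.80, 0.70, 0.20), (0.85, 0.75, 0.20), (0.80, 0.70, 0.25),
]
centers, labels = kmeans(triads, centers=[triads[0], triads[3]])

# Each cluster is given a letter character for the printed overlay.
letters = ["AB"[k] for k in labels]
```

The letters carry no terrain meaning by themselves. As the text notes, control data or limited field checking is still needed to name each class.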

The following terrain classes were automatically mapped from color infrared
film, with overall accuracy equal to or better than those of previous
multispectral scanner classification (better than 85 percent): 1) deciduous trees,
bushes, and bogs, 2) evergreen trees, 3) bedrock, largely granitic gneiss,
and rhyolite tuff (in a quarry), 4) bedrock, largely basalt and amphibolite,
5) talus, 6) rock and talus in shade, and 7) shade of trees and cliffs.

The resolution cell size of the present study closely simulates the computer- 
generated maps previously made from multispectral scanner data. Studies underway 
include classification of the same areas using smaller ground area elements. 

The present study indicates that color film can be used as an accurate means 
of multispectral terrain mapping by computer. An important additional advantage 
is that the results can be directly overlaid on the photograph. Geometric dis- 
tortions can be rectified by using stereoscopic pairs of aerial photographs and 
simple standard photogrammetric techniques. Color aerial photographs are readily
available at low cost, whereas scanner data are sparse and are expensive to obtain.

The widths of the bands used in this study are similar to those of the ERTS 
sensors. The accuracy of classification (better than 85 percent) indicates how 
well the ERTS sensors might be expected to perform in classifying similar terrain, 
and agrees closely with the results of ERTS simulations made using scanner data. 

CONCLUSIONS AND RECOMMENDATIONS


Direct comparisons cannot be made between any of the computer maps and the 
control data for several reasons. With the exception of the test with the color 
aerial photos, it is not possible to transfer the class boundaries accurately 
from the scanner format to the topographic base map or to annotated aerial photos 
on which the control data are plotted. Scanner distortions and size of the com- 
puter ground resolution cell place constraints on the accuracy of locating these 
boundaries. Continuing research and development at the Universities of Michigan, 
Kansas, and Purdue are minimizing these constraints. 

The control data consist of a surficial geologic map, a bedrock geologic map 
(both of which ignore the vegetation), and a map in which percentage of classes 
within broader areas are indicated (Figure 36). Thus, because we cannot accurate- 
ly locate and check accuracy of mapping of specific small clumps of trees or small 
and scattered outcrops, we can only compare percentages of these classes within 
the broader areas on the computer map with those on the generalized control map. 
But, for the purposes of these tests, we are really only interested in the larger 
features, anyway — those that constitute a large resolution cell comparable to 
that which can be resolved by the satellite sensors. 
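
The comparison of class percentages within a broader area can be sketched as follows. This is only a minimal illustration, not the procedure used in these tests: the function names, the sample cells, and the control proportions are hypothetical, and the class letters are borrowed from the legend of Figure 36.

```python
from collections import Counter


def class_proportions(cells):
    """Return the fraction of cells assigned to each class label."""
    counts = Counter(cells)
    total = len(cells)
    return {label: n / total for label, n in counts.items()}


def proportion_differences(computer_cells, control_proportions):
    """For each class, computer-map proportion minus control-map proportion."""
    mapped = class_proportions(computer_cells)
    labels = set(mapped) | set(control_proportions)
    return {
        label: mapped.get(label, 0.0) - control_proportions.get(label, 0.0)
        for label in labels
    }


# Hypothetical outlined area: ten classified cells from a computer map.
area_cells = ["F", "F", "F", "F", "F", "F", "T", "T", "T", "Ta"]
# Hypothetical decimal proportions for the same area on the control map.
control = {"F": 0.5, "T": 0.4, "Ta": 0.1}

differences = proportion_differences(area_cells, control)
for label in sorted(differences):
    print(label, round(differences[label], 2))
```

A positive difference means the computer map assigns more of the area to that class than the control map does; no cell-by-cell registration is needed for this kind of check.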

Furthermore, a direct comparison cannot be made of the results obtained 
using different computer techniques because of the differences in digitizing the 
data, number and size of training areas used, and resulting differences as sum- 
marized below. However, the objectives of the Michigan study were principally 
to test the effectiveness of preprocessing and of more closely simulating the 
ERTS data channels.

                                                      Purdue    Michigan

Total number of data points                          269,060     187,131
Number of training areas                                 187          24
Total number of data points in training areas          5,^18       3,521
Percent of total area used for training                   2%        1.9%
Maximum number of channels used in classification          4           3


The University of Kansas cluster processing used only every third row of 
cells and every third cell of such a row, resulting in a ground resolution cell 
of about 60 X 60 feet in contrast to the cells of 20 X 20 feet and 30 X 30 feet 
used in the other tests. However, in this regard, it more closely simulates the 
resolution of the satellite sensors. 
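
The effect of using every third row and every third cell can be illustrated with a short sketch. It assumes, as the arithmetic suggests, that the original cells were about 20 feet on a side; the grid values and the function name are hypothetical.

```python
def subsample(grid, step=3):
    """Keep every `step`-th row and every `step`-th cell of each kept row."""
    return [row[::step] for row in grid[::step]]


# Hypothetical 6 x 9 grid of cell values; each cell is assumed to be 20 x 20 feet.
cell_size_ft = 20
grid = [[r * 9 + c for c in range(9)] for r in range(6)]

coarse = subsample(grid, step=3)
coarse_cell_size_ft = cell_size_ft * 3  # each kept cell now stands for 60 x 60 feet

print(coarse)
print(coarse_cell_size_ft)
```

Only one cell in nine is retained, so each retained cell represents a ground area of about 60 X 60 feet, at one-ninth of the processing load.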

Despite the inability to make a point-by-point (cell-by-cell) comparison of the
computer maps with the control data, the overall accuracies can be rather well
determined, as indicated in previous chapters.

One of the overwhelming problems of automatic mapping of terrain classes is 
that the spectral signatures of a given class vary widely with such things as 
time of day, season of the year, latitude and flight direction (illumination- 
angle functions), and recency of rain in the region. I doubt that we can ade- 
quately sample the spectra of many classes of material under a wide enough range 
of conditions so that by the spectra alone we could identify materials. 

Lacking complete samples of spectral signatures, we would have to rely on 
supervised computer techniques, thus requiring some prior knowledge of the terrain. 

However, after studying the results of these tests, and on the basis of 
discussions with Dr. Robert Haralick of the University of Kansas, I am convinced 
that there is great potential in the cluster processing technique, and that auto- 
matic mapping will be most successful where the two kinds of processing — non- 
supervised and supervised — are performed in concert. Specifically, I suggest 
that the (non-supervised) cluster techniques be used to determine what the nat- 
ural terrain classes are and where they are. Then, sample spectra of these 
classes can be compared to spectral signatures in a master computer data bank 
to determine which class or classes of material most closely correspond to the 
spectra from these natural terrain classes. Although several specific classes 
may be equally likely on the basis of the data bank, several may be eliminated 
by mutual exclusiveness with adjoining classes, by reference to model studies 
such as described by Watson at this Workshop, or by some very general knowledge 
of the region surveyed. It may thus be possible to identify specific natural 
classes of terrain without field check. 
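
The suggested sequence (cluster first, then compare with a data bank, then eliminate unlikely candidates) might be sketched as below. This is an illustration under stated assumptions, not an existing program: the data-bank values, the tolerance, and the function names are hypothetical, and Euclidean distance is used only as a simple stand-in for whatever similarity measure a real data bank would employ.

```python
import math


def nearest_signatures(cluster_mean, data_bank, tolerance):
    """Return data-bank classes within `tolerance` of the cluster mean, closest first."""
    scored = []
    for name, signature in data_bank.items():
        distance = math.dist(cluster_mean, signature)
        if distance <= tolerance:
            scored.append((distance, name))
    return [name for _, name in sorted(scored)]


def prune_by_context(candidates, excluded):
    """Drop candidates ruled out by adjoining classes or general regional knowledge."""
    return [name for name in candidates if name not in excluded]


# Hypothetical three-channel signatures in a master data bank.
data_bank = {
    "water": (0.05, 0.03, 0.01),
    "forest": (0.08, 0.10, 0.40),
    "bog": (0.09, 0.12, 0.35),
    "talus": (0.30, 0.32, 0.35),
}

# Hypothetical mean spectrum of one cluster found by non-supervised processing.
cluster_mean = (0.085, 0.11, 0.37)

candidates = nearest_signatures(cluster_mean, data_bank, tolerance=0.05)
# Suppose general knowledge of the region rules out bog on this steep slope.
remaining = prune_by_context(candidates, excluded={"bog"})
print(candidates, remaining)
```

Here two classes are about equally likely from the spectra alone, and the contextual step leaves a single identification.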

Preprocessing of the data is an important part of the concept because 
it normalizes the data and thus minimizes the effects of variations in illumi- 
nation due to scanner "look" angle, topography, and shadows. Subsequent sets 
of data over the same area could be processed using conventional supervised
techniques.
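
One simple way in which preprocessing can normalize the data is to divide the radiance in each channel by the sum over all channels. The sketch below assumes this particular ratio form purely for illustration; the exact transformations used in the tests are not reproduced here, and the radiance values are hypothetical.

```python
def normalize_by_sum(radiances):
    """Divide each channel by the sum of all channels for the same cell."""
    total = sum(radiances)
    if total == 0:
        return [0.0 for _ in radiances]
    return [value / total for value in radiances]


# Hypothetical three-channel radiances of one material in full sunlight...
sunlit = [40.0, 60.0, 100.0]
# ...and of the same material receiving one quarter of the illumination.
shaded = [value * 0.25 for value in sunlit]

print(normalize_by_sum(sunlit))
print(normalize_by_sum(shaded))
```

Both cells yield the same normalized values, because a change in illumination that multiplies every channel by the same factor cancels in the ratio. Effects that differ from channel to channel are not removed by this form.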

If we examine the spectral range spanned for each of the displays (Tables
H, V, and Figure 37), we see that they vary by a factor of nearly 50, from
0.28 µm for the 3-camera ERTS system to 13.34 µm for the thermal overlay
classification. This implies that, for a broad range of terrain categories, many
combinations of three or four channels of data in the 0.4-14 µm range would be
satisfactory. More complete simulations, in which the effects of the atmosphere
are considered, will undoubtedly require identifying which channels are
suitable. For example, the haze penetration ability of some reflective infrared
channels, mentioned earlier, is an obvious advantage for satellite applications;
the blue part of the spectrum is apt to have a low signal-to-noise ratio and
therefore to be of limited use except for oceanography. We need worry about
careful selection of specific wavelength bands only if a specific category is being
sought. Inasmuch as the ERTS program is aimed at covering many scientific
disciplines and user groups, and hence involves many terrain categories, the
highly specific requirements are not now pertinent to tests of the suitability of
the proposed satellite sensors.

I believe that the concept, rather than the specific immediate results of
these studies, is the most important product. Admittedly, it is not really
important to find that talus occurs on the shore of a lake here or that a narrow
bog lies there; we already know most of that for this particular area. The
important point is that eight or more widely different terrain units could be
accurately mapped, automatically. For the moment it does not really matter what
the units are or where they occur; they could as easily have been orchards, barns,
municipal parks surrounded by streets and buildings, beaches, polluted or clean
water, marshlands, etc.

In fact, I believe that these particular maps are over-classified in comparison
with what we will want to attempt from space, at least for our first attempts.
It may well suffice to map out such features as WATER, VEGETATION, BARE
SOIL, and ROCKS, and to interpret other things, such as geologic structure, from
the resulting patterns and their relations to topography.

Especially significant applications in geology and other fields will be for
those features that are time-dependent, changing with the seasons or over a few
years' time. Once an area has been mapped by computer, the areas of change can
be periodically mapped automatically in terms of material, location, and the
amount of area changed.
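
Mapping change in terms of material, location, and area can be sketched by comparing two classified maps of the same ground, cell by cell. The example assumes the two maps are already registered to one another; the class letters, the maps, and the 60 X 60 foot cell are hypothetical.

```python
def map_changes(earlier, later, cell_area):
    """List cells whose class changed and the total area changed.

    Each change is recorded as (row, column, earlier class, later class).
    """
    changes = []
    for r, (row_a, row_b) in enumerate(zip(earlier, later)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                changes.append((r, c, a, b))
    return changes, len(changes) * cell_area


# Hypothetical classified maps of the same area at two dates
# (W water, B bog, F forest), with cells of 60 x 60 feet.
earlier = [["W", "W", "B"],
           ["F", "F", "B"]]
later = [["W", "B", "B"],
         ["F", "F", "F"]]

changes, area_changed_sq_ft = map_changes(earlier, later, cell_area=60 * 60)
print(changes)
print(area_changed_sq_ft)
```

The list of changes gives the material (before and after) and the location of each changed cell, and the count of changed cells gives the amount of area changed.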

I suggest that economically feasible geologic applications will include those
that contribute to regional mapping, engineering geology, hydrology, and
volcanology. Other applications may be in the fields of agriculture, cartography,
land-use and land-management studies, and in still other fields in which seasonal
and other changes are more rapid than in most geologic applications. In many
fields, these data will become more useful when combined with other (nonspectral)
data; for example, in engineering applications to trafficability studies, these
terrain data can be combined with slope (from radar images or topographic maps).

The fact that we are sensing surface material emphasizes the need for a
multidisciplinary approach to terrain mapping, because the surface involves the
complex interplay of at least bedrock and surficial geology, hydrology, soils, 
vegetation, and meteorology. Traditionally, in mapping many regions of the 
earth, we interpret the geology secondarily from the patterns of other materials 
and features. 

I hope that, in this brief review of the steps involved in acquiring and 
processing the data, you can see in the results some applications to your own 
fields of interest.

GLOSSARY OF SPECIAL TERMS USED

Basalt lava. Fine-grained, dark colored lavas relatively rich in calcium, iron,
and magnesium, and low in silica.

Breccia (volcanic). A rock formed of compacted volcanic fragments embedded in 
a tuff matrix. 

Cathode ray tube. A vacuum tube that generates a focused beam of electrons which
can be deflected by electric and/or magnetic fields. The terminus of the
beam is visible as a spot or line of luminescence caused by its impinging
on a sensitized screen at one end of the tube. These tubes are used to
reproduce pictures in television receivers or to study the shapes of electric
waves.

Control data. This refers to all that is known about the site conditions,
including types and distribution of materials (determined from conventional
field mapping and examination supplemented by study of photographs taken
from the air and ground), and measurements of such parameters as temperature,
relative humidity, porosity, moisture content, and spectral reflectance
of surface materials. Collectively, these constitute the control
data with which the test data can be compared.

Covariance. The manner in which one feature or parameter varies in relation to
another.

ERTS. Acronym for the Earth Resources Technology Satellite.

Gneiss. A coarse-grained metamorphic rock in which bands rich in granular
minerals alternate with bands in which platy minerals predominate. These are
derived from pre-existing impure sandstone, shale, or granite during the
dynamic and thermal processes involved in mountain-building.

Kame. A mound composed chiefly of gravel and sand, whose form is the result of
original deposition by settling during the melting of glacier ice against 
or upon which the sediment accumulated. 

Maximum-likelihood. The maximum probability or chance. The statistical parameters
of radiance may somewhat resemble those of two (or more) different
classes of material. The relative likelihood (probability or "odds") that 
the data in question belong to class A as opposed to class B or C is 
P(A)/P(B) or P(A)/P(C). When the statistical parameters are chosen so that 
these ratios are optimum, then the likelihood is maximum and the data are 
assigned to that class (A,B, or C) which has maximum probability. 

µm. Micrometer; the millionth part of a meter. Formerly called micron.

Rhyolite. Fine-grained, light colored lavas and other igneous rock bodies
relatively rich in potassium, sodium, and silica, and low in calcium, 
magnesium, and iron.

Solifluction. The process of slow downhill flowage of masses of soil and
unconsolidated surface debris saturated with water.

Till. Nonsorted, nonstratified sediment carried or deposited by a glacier.

Tuff. A rock formed of compacted volcanic ash and other fragments generally
smaller than 4 mm in diameter. 
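
The Maximum-likelihood entry above, together with the threshold limits shown in Figure 20, can be illustrated by a small sketch. It assumes that each class has a normal (Gaussian) distribution of radiance in each channel and, as a simplification, that the channels vary independently (covariance between channels is ignored). The class statistics, the threshold, and the function names are hypothetical and are not taken from the programs used in the study.

```python
import math


def log_likelihood(sample, means, variances):
    """Log of the normal probability density, channels treated as independent."""
    total = 0.0
    for x, mean, var in zip(sample, means, variances):
        total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return total


def classify(sample, class_stats, threshold=None):
    """Assign the sample to the class of maximum likelihood.

    Returns None (rejected) if even the best class falls below the threshold.
    """
    scores = {
        name: log_likelihood(sample, means, variances)
        for name, (means, variances) in class_stats.items()
    }
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return None
    return best


# Hypothetical two-channel training statistics: (means, variances) per class.
class_stats = {
    "water": ([10.0, 5.0], [4.0, 4.0]),
    "forest": ([30.0, 60.0], [25.0, 36.0]),
}

print(classify([12.0, 6.0], class_stats, threshold=-50.0))
print(classify([28.0, 55.0], class_stats, threshold=-50.0))
print(classify([200.0, 200.0], class_stats, threshold=-50.0))
```

The third sample resembles neither class, so it is rejected rather than forced into the less unlikely of the two; this corresponds to the blank (rejected) cells on the terrain maps.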

Figure 1. Index map of United States showing location of Yellowstone National
Park (shaded) and the test site (black).




Figure 2. Panorama photo of test site looking west from near east edge.
Crescent Hill is on the left edge.

Figure 3. Diagram of optical-mechanical scanner and spectrometer used
by the University of Michigan in gathering data for this study.

Figure 4. Bedrock exposure of basalt lava flows.

Figure 5. Talus of rhyolite tuff at Floating Island Lake. Crescent
Hill is in the background.



Figure 6. Blocks of rhyolite tuff in talus showing contrast between
fresh surfaces (below hammer head) and surfaces coated with dark
lichens, which is what the scanner records.

Figures 7 and 8. Vegetated rock rubble. These are mixtures of angular
blocks of basalt (fig. 7), bedrock slabs and blocks of gneiss (fig. 8),
lichens, soil, dry grass, sagebrush, weeds, evergreen seedlings, and
twigs.

Figures 9 and 10. Glacial kame meadow, showing grass, mineral soil,
weeds, dead vegetation, elk manure (fig. 9) and sagebrush debris
(fig. 10).











Figures 11, 12, and 13. Glacial till, showing sand, rock chips, and boulders
in mineral soil, grass, sagebrush, weeds, and twigs. Wide range in texture is
shown: fine-grained (fig. 11), mixed (fig. 12), and coarse-grained (fig. 13).

Figure 14. Gray-scale video display of radiance from channel 9 (0.62-0.66 µm).
Area is same as shown in eastern parts of figures 26 through 30.









Figure 16. Example of density slicing (middle) and contouring (bottom) of originally continuous tones
(top) of thermal infrared imagery. Area overlaps and extends east (right) of Figure 14.




Figure 17. Color coded quantized thermal infrared image. Temperatures, from coldest to warmest, are
shown on original color print by white, blue, yellow, red, green, and black, but are portrayed by 
different shades of gray on this black and white copy. 


Figure 18. Radiance of water (W) and forest (F). Radiance (r) increases upward. A vertical line
two standard deviations long, centered about the mean radiance, is shown for each of the 12
spectrometer channels.

Figure 19. Covariance diagram showing distribution of radiance data for
four hypothetical classes of material (A-D) in channels 1 and 2
(R1, R2).



Figure 20. Frequency diagram of covariance data of same four classes
(A-D) and channels (R1, R2) as in figure 19. The surface is that
which bounds the distribution of data points, and is topologically
equivalent to a probability surface. The cylinder indicates
placement of threshold limits; for simplicity, it is shown for
class A only.

Figure 21. Analog recognition display of the FOREST class (white).



Figure 22. Ten-level gray-scale digital computer display of radiance from
channel 9 (0.62-0.66 µm), as obtained by Purdue University. Area
shown is the bottom (south) half of that shown in Figure 14.

Figure 23. Histograms of reflectance of talus in channels 1, 2, and 5.
The abscissa is relative radiance (brightness), increasing to the right.
On this copy of the computer printout, the ordinate gives the number
of resolution elements with a given relative radiance. Band width of
each channel given in micrometers.

Figure 24. Comparisons of spectral reflectance of training areas of four classes of material:
$ Talus, + Vegetated rock rubble, = Kame, * Forest. Reflectance or radiance, increasing
upward, is shown for each of the 12 channels of the Michigan scanner data. A vertical line
two standard deviations long, centered about the mean radiance, is drawn using alphanumeric
symbols.



Figure 25. Segment of terrain map obtained by using Purdue University's digital computer-selected best
set of four channels of radiance data. Symbols used to designate the terrain classes are:
. Bedrock Exposures, $ Vegetated rock rubble, - Glacial till meadows, : Bog, H Shadows, 8 Talus,
» Glacial kame meadows, / Forest, W Surface water, (Blank) Rejected.







Figure 26. Segment of terrain map obtained by combining one thermal infrared and three reflective
channels of data. Symbols used to designate the terrain classes are the same as in figure 25
(water and bedrock are not present in this display). Because the scan angle of the thermal
scanner was much narrower than that of the reflective scanner, this display covers only the
middle strip of those shown in figures 25, 27, and 28.

Figure 28. Segment of terrain map obtained by using simulation of ERTS 4-channel scanner data.
Symbols used to designate the terrain classes are the same as in figure 25.

Figure 29. Spectral channel output versus scan angle for three classes, showing
the scan angle functions. Each vertical bar is two standard deviations long,
centered about the mean.

Figure 30. Transformed spectral channel output versus scan angle for three
materials, showing mean values of combined (transformed) signatures. Each
vertical bar is two standard deviations long, centered about the mean.









Figure 31. Segment of terrain map obtained by using the University of Michigan digital computer terrain
classification programs with preprocessing by scan-angle transformation function using channels 2, 5,
and 12 (table 1). Symbols used to designate the terrain classes are shown in table 4. Unfortunately,
this map could not be reproduced in color; therefore several different classes may be indistinguishable.

Figure 32. Segment of terrain map obtained by using the University of Michigan digital computer
terrain classification programs with preprocessing by ratio transformation using simulated data
of the ERTS 4-channel scanner. Symbols used to designate the terrain classes are shown in
table 4.

Figure 34. Schematic diagram showing scanning of color infrared aerial
film and the distribution of the density data from the three film
layers in three-color space. Four successive resolution cells along
the microdensitometer scan direction are shown. Hypothetical clusters
A and B represent two classes whose means have coordinates indicated
by the dashed lines. The arrow indicates the plotted position of
density data for one of the resolution cells, which falls within the
cluster of class A.

Figure 35. Black and white copy of color infrared aerial film used in
the clustering study (top image) showing two of the terrain classes
derived by computer processing as white circles (deciduous trees) and
squares (bedrock of basalt and amphibolite).

Figure 36. Segment of ground-control map, southeast corner of test site.
Numbers indicate decimal proportion of classes present in each outlined
area. Classes are indicated by letters:

T   glacial till meadow
Ta  talus
F   forest
R   vegetated bedrock rubble
X   bedrock
W   water
S   shadow
B   bog

The dark line is a road.

Figure 37. Comparison of wavelength bands used in the computer studies.

THERMAL OVERLAY: four channels were combined and overlaid by the
computer to make a single set of data from three scanner systems,
using one reflective and two thermal scanners.

BEST SET: computer-selected best set of four reflective channels of
data from the 12-channel scanner.

ERTS 4: simulation of the ERTS 4-channel scanner data.

ERTS 3: simulation of the ERTS 3 RBV camera data.

FILM: use of the three emulsion layers of color infrared film as a
3-band spectrometer for mapping by clustering techniques.


